Take-home Exercise 1: Application of Spatial Point Patterns Analysis – Geographical Distribution of Grab hailing services (Singapore)

Author

Victoria Grace ANN

Published

January 22, 2024

Modified

January 29, 2024

Context

Introduction

Human mobility, the movement of people in space and time, reflects the spatial-temporal characteristics of human behaviour (Wang et al., 2021). With the advancement Information and Communication Technologies (ICT), pertinent through the ubiquitous use of smartphones, a large and growing volume of data relating to human mobility is available today. By using appropriate GIS analysis methods, such data are potentially useful in supporting smart-city planning and management.

Data

In 2020, an interesting human mobility data set called Grab Posisi was released by Grab, one of the largest shared taxi operators in Southeast Asia. One of the two data sets released is soley focusing on Singapore!

In Singapore, relevant human mobility data can be extracted from Land Transport Authority (LTA) DataMall. In particular, two data sets related to human mobility are provided by the portal: Passenger Volume by Origin Destination Train Stations and Passenger Volume by Origin Destination Bus Stops.

Limitation

A limitation of the LTA data sets is that their locations are biased to either bus stops or MRT/LRT stations. Hence, there will naturally be clusters of embarking and disembarking passengers at stops and stations, and other interesting clusters by other modes of transport are overlooked.

Importing Packages

The following R packages are used for this assignment:

  • arrow, to read and write Parquet files (format in which data is in)

  • lubridate, to work with time-related data more easily

  • tidyverse, for importing, wrangling and visualising data

  • tmap, to create thematic maps

  • sf, for importing, managing and processing geospatial data

  • spatstat, for point pattern analysis

  • raster, reads, writes, manipulates, analyses and model of gridded spatial data (raster)

  • spNetwork, to perform kernel density estimation and build spatial matrices to conduct spatial analysis on reticular distances

pacman::p_load(arrow, lubridate, tidyverse, tmap, sf, spatstat, raster, spNetwork,sp)

Data

Geospatial Data

Data preprocessing

Master Plan 2019 Subzone Boundary

First, we want to get a clear Singapore boundary layer.

Extracted from data.gov.sg, let’s read the 2019 master plan subzone boundary layer.

mpsz <- st_read("data/geospatial/master-plan-2019-subzone-boundary-no-sea-kml.kml")
summary(mpsz)
  • The dimension contains XYZ, meaning that mpsz contains 3D geometries

Convert mpsz to 2D geometry using st_zm() from sf and save the converted form as a shapefile.

mpsz <- st_zm(mpsz, drop = TRUE) # Convert 3D geometries to 2D

Create a new column SUBZONE_N containing the subzone names.

# create a new column 'SUBZONE_N' and extract subzone names from 'Dscrptn' field
mpsz <- mpsz %>%
  rowwise() %>%
  mutate(SUBZONE_N = str_extract(Description, "<th>SUBZONE_N</th> <td>(.*?)</td>")) %>%
  ungroup()

# remove HTML tags and 'SUBZONE_N' from 'SUBZONE_N' column
mpsz$SUBZONE_N <- str_remove_all(mpsz$SUBZONE_N, "<.*?>|SUBZONE_N")

# view the updated 'mpsz3414' dataframe
head(mpsz$SUBZONE_N)

Create a new column PLN_AREA_N containing the planning area names.

# create a new column 'PLN_AREA_N' and extract subzone names from 'Dscrptn' field
mpsz <- mpsz %>%
  rowwise() %>%
  mutate(PLN_AREA_N = str_extract(Description, "<th>PLN_AREA_N</th> <td>(.*?)</td>")) %>%
  ungroup()

# remove HTML tags and 'PLN_AREA_N' from 'PLN_AREA_N' column
mpsz$PLN_AREA_N <- str_remove_all(mpsz$PLN_AREA_N, "<.*?>|PLN_AREA_N")
# remove the 'Description' column using the $ operator
mpsz$Description <- NULL
#st_write(mpsz, "data/geospatial/mpsz.shp")
mpsz_sf <- st_read("data/geospatial/mpsz.shp")
Reading layer `mpsz' from data source 
  `C:\guacodemoleh\IS415-GAA\Take-home_Ex\Take-home_Ex01\data\geospatial\mpsz.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84
class(mpsz_sf)
[1] "sf"         "data.frame"

Then transform the CRS from WGS84 to SVY21.

mpsz3414 <- st_transform(mpsz_sf, 3414)
class(mpsz3414)
[1] "sf"         "data.frame"

No commercial ride hailing services are available in the outer islands of Singapore. Let’s identify them first.

Outer Islands Subzones Viewed with QGIS

Filter out the outer islands.

outer_islands <- c("SEMAKAU", "SUDONG", "NORTH-EASTERN ISLANDS", "SOUTHERN GROUP")

# remove rows where 'SUBZONE_N' is in the list
mpsz3414 <- mpsz3414 %>%
  filter(!str_trim(SUBZONE_N) %in% str_trim(outer_islands))
class(mpsz3414)
[1] "sf"         "data.frame"

Then dissolve all the inner subzone boundaries using st_union() (if necessary).

sg_sf <- mpsz3414 %>% st_union()
plot(sg_sf)

OSM Singapore Roads

Get the roads layer from OSM. Since the file is very large, it takes a long time to get the intersection between Singapore and roads (I took about 7 minutes to execute this step).

roads <- st_read(dsn="data/geospatial", layer="gis_osm_roads_free_1") %>%
  st_transform(crs=3414)

sg_roads <- st_intersection(roads,sg_sf)

plot(sg_roads) 

Alternatively you may use QGIS to select the roads in Singapore and save the roads into a shape file.

The following two code chunks can be uncommented if you are taking this approach.

sg_roads <- st_read(dsn= "data/geospatial", layer="sg_roads")
str(sg_roads)

Verify that roads are in the correct in CRS

st_crs(sg_roads) <- st_crs(3414)
st_crs(sg_roads)
  • The CRS ID is 3414 which is correct.

Data Size Management

Before further data manipulation, save the files in rds format as the size of roads3414 is very large.

write_rds(sg_roads,"data/geospatial/rds/sg_roads.rds")

Read the written roads file.

sg_roads <- read_rds("data/geospatial/rds/sg_roads.rds")

Aspatial Data

Data preprocessing

Read all the Grab-Posisi using arrow’s read_parquet() function to read the parquet files containing the ride trajectories.

df0 <- read_parquet("data/aspatial/part-00000-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")
df1 <- read_parquet("data/aspatial/part-00001-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")
df2 <- read_parquet("data/aspatial/part-00002-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")
df3 <- read_parquet("data/aspatial/part-00003-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")
df4 <- read_parquet("data/aspatial/part-00004-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")
df5 <- read_parquet("data/aspatial/part-00005-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")
df6 <- read_parquet("data/aspatial/part-00006-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")
df7 <- read_parquet("data/aspatial/part-00007-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")
df8 <- read_parquet("data/aspatial/part-00008-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")
df9 <- read_parquet("data/aspatial/part-00009-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")

Then combine the rows using the bindrows function in the dplyr package embedded in the tidyverse package.

df <- bind_rows(df0, df1, df2, df3, df4, df5, df6, df7, df8, df9)
head(df)

The observed pingtimestamp is not easily comprehensible, thus it needs to be amended to a proper timestamp.

df$pingtimestamp <- as_datetime(df$pingtimestamp) ## $ to overwrite the variable in df
head(df$pingtimestamp)

The data can be further segmented to two groups:

  • origin_df representing the starting locations of trips taken

  • dest_df representing the ending locations of trips taken

Starting Locations

origin_df <- df %>%
  group_by(trj_id) %>%
  arrange(pingtimestamp) %>% 
  filter(row_number()==1) %>% 
  mutate(weekday = wday(pingtimestamp, 
                        label=TRUE,
                        abbr=TRUE), 
         start_hr = factor(hour(pingtimestamp)), 
         day = factor(mday(pingtimestamp)))

Ending Locations

dest_df <- df %>%
  group_by(trj_id) %>%
  arrange(desc(pingtimestamp)) %>% 
  filter(row_number()==1) %>% 
  mutate(weekday = wday(pingtimestamp, 
                        label=TRUE,
                        abbr=TRUE), 
       end_hr = factor(hour(pingtimestamp)), 
         day = factor(mday(pingtimestamp)))

Data Size Management

Before further data manipulation, save the files in rds format as the size of the data frames are very large. Running the whole tibble data frame will not be practical and the processing time will be very long.

write_rds(origin_df,"data/aspatial/rds/origin_df.rds")
write_rds(dest_df,"data/aspatial/rds/dest_df.rds")
origin_df <- read_rds("data/aspatial/rds/origin_df.rds")
dest_df <- read_rds("data/aspatial/rds/dest_df.rds")

Further Observation

Inspect the fields in origin_df and dest_df.

list(origin_df)
[[1]]
# A tibble: 28,000 × 12
# Groups:   trj_id [28,000]
   trj_id driving_mode osname  pingtimestamp       rawlat rawlng  speed bearing
   <chr>  <chr>        <chr>   <dttm>               <dbl>  <dbl>  <dbl>   <int>
 1 70895  car          android 2019-04-08 00:09:26   1.38   104.  9.95      111
 2 21926  car          android 2019-04-08 00:09:48   1.29   104. 11.0        75
 3 47498  car          ios     2019-04-08 00:09:50   1.38   104. 18.6       307
 4 18103  car          android 2019-04-08 00:09:55   1.45   104.  0.404     159
 5 41322  car          android 2019-04-08 00:09:57   1.28   104. 17.9       232
 6 64813  car          ios     2019-04-08 00:10:03   1.31   104. 17.1       106
 7 81518  car          ios     2019-04-08 00:10:14   1.31   104.  6.24      213
 8 66542  car          android 2019-04-08 00:11:17   1.36   104.  9.11      179
 9 25201  car          ios     2019-04-08 00:12:05   1.37   104. 12.0       211
10 82401  car          android 2019-04-08 00:12:11   1.30   104. 10.6       107
# ℹ 27,990 more rows
# ℹ 4 more variables: accuracy <dbl>, weekday <ord>, start_hr <fct>, day <fct>
list(dest_df)
[[1]]
# A tibble: 28,000 × 12
# Groups:   trj_id [28,000]
   trj_id driving_mode osname  pingtimestamp       rawlat rawlng  speed bearing
   <chr>  <chr>        <chr>   <dttm>               <dbl>  <dbl>  <dbl>   <int>
 1 81574  car          ios     2019-04-21 23:56:49   1.34   104. 15.3       103
 2 54687  car          android 2019-04-21 23:56:46   1.44   104.  8.15      299
 3 17190  car          android 2019-04-21 23:56:36   1.34   104. 12.4       202
 4 13793  car          android 2019-04-21 23:56:30   1.32   104.  6.47      170
 5 39014  car          ios     2019-04-21 23:56:27   1.33   104.  3.59      169
 6 41170  car          ios     2019-04-21 23:56:13   1.32   104. 13.1        71
 7 64519  car          ios     2019-04-21 23:55:49   1.43   104. 14.3       239
 8 70461  car          ios     2019-04-21 23:55:32   1.29   104.  0.970      51
 9 41154  car          ios     2019-04-21 23:55:10   1.32   104. 10.8       118
10 65488  car          ios     2019-04-21 23:54:47   1.38   104.  0         244
# ℹ 27,990 more rows
# ℹ 4 more variables: accuracy <dbl>, weekday <ord>, end_hr <fct>, day <fct>

Creating a Simple Feature Data Frame from an Aspatial Data Frame

Convert the data frames into simple features data frame such that the locations can be plotted.

Locations

Starting Locations

origin_sf <- origin_df %>% 
  st_as_sf(coords=c("rawlng","rawlat"),
           crs=4326) %>%
  st_transform(crs = 3414)

Ending Locations

dest_sf <- dest_df %>% 
  st_as_sf(coords=c("rawlng","rawlat"),
           crs=4326) %>%
  st_transform(crs = 3414)

Preparation for Spatial Point Patterns Analysis

Converting sf Format into spatstat’s ppp Format

Use as.ppp() of spatstat to convert the sf object ppp format.

Starting Locations

origin_ppp <- as.ppp(st_coordinates(origin_sf), st_bbox(origin_sf))
plot(origin_ppp)

Ending Locations

dest_ppp <- as.ppp(st_coordinates(dest_sf), st_bbox(dest_sf))
plot(dest_ppp)

Look at summary statistics.

Starting Locations

summary(origin_ppp)
Planar point pattern:  28000 points
Average intensity 2.473666e-05 points per square unit

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: rectangle = [3628.24, 49845.23] x [25198.14, 49689.64] units
                    (46220 x 24490 units)
Window area = 1131920000 square units

Ending Locations

summary(dest_ppp)
Planar point pattern:  28000 points
Average intensity 2.493661e-05 points per square unit

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: rectangle = [3637.21, 49870.63] x [25221.3, 49507.79] units
                    (46230 x 24290 units)
Window area = 1122850000 square units
any(duplicated(origin_ppp))
[1] FALSE
any(duplicated(dest_ppp))
[1] FALSE
  • With no duplicated starting locations nor ending locations, we can proceed to set up for our spatial point patterns analysis.

Creating owin object

sg_owin <- as.owin(sg_sf)
plot(sg_owin)

summary(sg_owin)
Window: polygonal boundary
56 separate polygons (36 holes)
                  vertices         area relative.area
polygon 1            15307  7.00834e+08      9.92e-01
polygon 2              285  1.61128e+06      2.28e-03
polygon 3               27  1.50315e+04      2.13e-05
polygon 4 (hole)        41 -4.01660e+04     -5.69e-05
polygon 5 (hole)       317 -5.11280e+04     -7.24e-05
polygon 6 (hole)         3 -4.14099e-04     -5.86e-13
polygon 7               30  2.80002e+04      3.97e-05
polygon 8 (hole)         4 -2.86396e-01     -4.06e-10
polygon 9 (hole)         3 -1.81439e-04     -2.57e-13
polygon 10 (hole)        3 -8.68789e-04     -1.23e-12
polygon 11 (hole)        3 -5.99535e-04     -8.49e-13
polygon 12 (hole)        3 -3.04561e-04     -4.31e-13
polygon 13 (hole)        3 -4.46076e-04     -6.32e-13
polygon 14 (hole)        3 -3.39794e-04     -4.81e-13
polygon 15 (hole)        3 -4.52043e-05     -6.40e-14
polygon 16 (hole)        3 -3.90173e-05     -5.53e-14
polygon 17 (hole)        3 -9.59850e-05     -1.36e-13
polygon 18 (hole)        4 -2.54488e-04     -3.60e-13
polygon 19 (hole)        4 -4.28453e-01     -6.07e-10
polygon 20 (hole)        4 -2.18616e-04     -3.10e-13
polygon 21 (hole)        5 -2.44411e-04     -3.46e-13
polygon 22 (hole)        5 -3.64686e-02     -5.16e-11
polygon 23              71  8.18750e+03      1.16e-05
polygon 24 (hole)        6 -8.37554e-01     -1.19e-09
polygon 25 (hole)       38 -7.79904e+03     -1.10e-05
polygon 26 (hole)        3 -3.41897e-05     -4.84e-14
polygon 27 (hole)        3 -3.65499e-03     -5.18e-12
polygon 28 (hole)        3 -4.95057e-02     -7.01e-11
polygon 29              91  1.49663e+04      2.12e-05
polygon 30 (hole)        3 -3.79135e-02     -5.37e-11
polygon 31 (hole)        5 -2.92235e-04     -4.14e-13
polygon 32 (hole)        3 -7.43616e-06     -1.05e-14
polygon 33 (hole)      270 -1.21455e+03     -1.72e-06
polygon 34 (hole)       19 -4.39650e+00     -6.23e-09
polygon 35 (hole)       35 -1.38385e+02     -1.96e-07
polygon 36 (hole)       23 -1.99656e+01     -2.83e-08
polygon 37              40  1.38607e+04      1.96e-05
polygon 38 (hole)       41 -6.00381e+03     -8.50e-06
polygon 39 (hole)        7 -1.40546e-01     -1.99e-10
polygon 40 (hole)       11 -8.36705e+01     -1.18e-07
polygon 41 (hole)        3 -2.33435e-03     -3.31e-12
polygon 42              45  2.51218e+03      3.56e-06
polygon 43             139  3.22293e+03      4.56e-06
polygon 44             148  3.10395e+03      4.40e-06
polygon 45 (hole)        4 -1.72650e-04     -2.44e-13
polygon 46              75  1.73526e+04      2.46e-05
polygon 47              83  5.28920e+03      7.49e-06
polygon 48             106  3.04104e+03      4.31e-06
polygon 49             264  1.50631e+06      2.13e-03
polygon 50              71  5.63061e+03      7.97e-06
polygon 51              10  1.99717e+02      2.83e-07
polygon 52 (hole)        3 -1.37223e-02     -1.94e-11
polygon 53             487  2.06117e+06      2.92e-03
polygon 54              65  8.42861e+04      1.19e-04
polygon 55              47  3.82087e+04      5.41e-05
polygon 56              22  6.74651e+03      9.55e-06
enclosing rectangle: [2667.54, 55941.94] x [21448.47, 50256.33] units
                     (53270 x 28810 units)
Window area = 706156000 square units
Fraction of frame area: 0.46

Combining point events object and owin object

Extract points that are located within Singapore

Starting Locations

originSG_ppp = origin_ppp[sg_owin]

Ending Locations

destSG_ppp = dest_ppp[sg_owin]

The output object contains the combination of the point and polygon feature into one ppp object class.

Starting Locations

summary(originSG_ppp)
Planar point pattern:  28000 points
Average intensity 3.965129e-05 points per square unit

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: polygonal boundary
56 separate polygons (36 holes)
                  vertices         area relative.area
polygon 1            15307  7.00834e+08      9.92e-01
polygon 2              285  1.61128e+06      2.28e-03
polygon 3               27  1.50315e+04      2.13e-05
polygon 4 (hole)        41 -4.01660e+04     -5.69e-05
polygon 5 (hole)       317 -5.11280e+04     -7.24e-05
polygon 6 (hole)         3 -4.14099e-04     -5.86e-13
polygon 7               30  2.80002e+04      3.97e-05
polygon 8 (hole)         4 -2.86396e-01     -4.06e-10
polygon 9 (hole)         3 -1.81439e-04     -2.57e-13
polygon 10 (hole)        3 -8.68789e-04     -1.23e-12
polygon 11 (hole)        3 -5.99535e-04     -8.49e-13
polygon 12 (hole)        3 -3.04561e-04     -4.31e-13
polygon 13 (hole)        3 -4.46076e-04     -6.32e-13
polygon 14 (hole)        3 -3.39794e-04     -4.81e-13
polygon 15 (hole)        3 -4.52043e-05     -6.40e-14
polygon 16 (hole)        3 -3.90173e-05     -5.53e-14
polygon 17 (hole)        3 -9.59850e-05     -1.36e-13
polygon 18 (hole)        4 -2.54488e-04     -3.60e-13
polygon 19 (hole)        4 -4.28453e-01     -6.07e-10
polygon 20 (hole)        4 -2.18616e-04     -3.10e-13
polygon 21 (hole)        5 -2.44411e-04     -3.46e-13
polygon 22 (hole)        5 -3.64686e-02     -5.16e-11
polygon 23              71  8.18750e+03      1.16e-05
polygon 24 (hole)        6 -8.37554e-01     -1.19e-09
polygon 25 (hole)       38 -7.79904e+03     -1.10e-05
polygon 26 (hole)        3 -3.41897e-05     -4.84e-14
polygon 27 (hole)        3 -3.65499e-03     -5.18e-12
polygon 28 (hole)        3 -4.95057e-02     -7.01e-11
polygon 29              91  1.49663e+04      2.12e-05
polygon 30 (hole)        3 -3.79135e-02     -5.37e-11
polygon 31 (hole)        5 -2.92235e-04     -4.14e-13
polygon 32 (hole)        3 -7.43616e-06     -1.05e-14
polygon 33 (hole)      270 -1.21455e+03     -1.72e-06
polygon 34 (hole)       19 -4.39650e+00     -6.23e-09
polygon 35 (hole)       35 -1.38385e+02     -1.96e-07
polygon 36 (hole)       23 -1.99656e+01     -2.83e-08
polygon 37              40  1.38607e+04      1.96e-05
polygon 38 (hole)       41 -6.00381e+03     -8.50e-06
polygon 39 (hole)        7 -1.40546e-01     -1.99e-10
polygon 40 (hole)       11 -8.36705e+01     -1.18e-07
polygon 41 (hole)        3 -2.33435e-03     -3.31e-12
polygon 42              45  2.51218e+03      3.56e-06
polygon 43             139  3.22293e+03      4.56e-06
polygon 44             148  3.10395e+03      4.40e-06
polygon 45 (hole)        4 -1.72650e-04     -2.44e-13
polygon 46              75  1.73526e+04      2.46e-05
polygon 47              83  5.28920e+03      7.49e-06
polygon 48             106  3.04104e+03      4.31e-06
polygon 49             264  1.50631e+06      2.13e-03
polygon 50              71  5.63061e+03      7.97e-06
polygon 51              10  1.99717e+02      2.83e-07
polygon 52 (hole)        3 -1.37223e-02     -1.94e-11
polygon 53             487  2.06117e+06      2.92e-03
polygon 54              65  8.42861e+04      1.19e-04
polygon 55              47  3.82087e+04      5.41e-05
polygon 56              22  6.74651e+03      9.55e-06
enclosing rectangle: [2667.54, 55941.94] x [21448.47, 50256.33] units
                     (53270 x 28810 units)
Window area = 706156000 square units
Fraction of frame area: 0.46

Ending Locations

summary(destSG_ppp)
Planar point pattern:  27997 points
Average intensity 3.964704e-05 points per square unit

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: polygonal boundary
56 separate polygons (36 holes)
                  vertices         area relative.area
polygon 1            15307  7.00834e+08      9.92e-01
polygon 2              285  1.61128e+06      2.28e-03
polygon 3               27  1.50315e+04      2.13e-05
polygon 4 (hole)        41 -4.01660e+04     -5.69e-05
polygon 5 (hole)       317 -5.11280e+04     -7.24e-05
polygon 6 (hole)         3 -4.14099e-04     -5.86e-13
polygon 7               30  2.80002e+04      3.97e-05
polygon 8 (hole)         4 -2.86396e-01     -4.06e-10
polygon 9 (hole)         3 -1.81439e-04     -2.57e-13
polygon 10 (hole)        3 -8.68789e-04     -1.23e-12
polygon 11 (hole)        3 -5.99535e-04     -8.49e-13
polygon 12 (hole)        3 -3.04561e-04     -4.31e-13
polygon 13 (hole)        3 -4.46076e-04     -6.32e-13
polygon 14 (hole)        3 -3.39794e-04     -4.81e-13
polygon 15 (hole)        3 -4.52043e-05     -6.40e-14
polygon 16 (hole)        3 -3.90173e-05     -5.53e-14
polygon 17 (hole)        3 -9.59850e-05     -1.36e-13
polygon 18 (hole)        4 -2.54488e-04     -3.60e-13
polygon 19 (hole)        4 -4.28453e-01     -6.07e-10
polygon 20 (hole)        4 -2.18616e-04     -3.10e-13
polygon 21 (hole)        5 -2.44411e-04     -3.46e-13
polygon 22 (hole)        5 -3.64686e-02     -5.16e-11
polygon 23              71  8.18750e+03      1.16e-05
polygon 24 (hole)        6 -8.37554e-01     -1.19e-09
polygon 25 (hole)       38 -7.79904e+03     -1.10e-05
polygon 26 (hole)        3 -3.41897e-05     -4.84e-14
polygon 27 (hole)        3 -3.65499e-03     -5.18e-12
polygon 28 (hole)        3 -4.95057e-02     -7.01e-11
polygon 29              91  1.49663e+04      2.12e-05
polygon 30 (hole)        3 -3.79135e-02     -5.37e-11
polygon 31 (hole)        5 -2.92235e-04     -4.14e-13
polygon 32 (hole)        3 -7.43616e-06     -1.05e-14
polygon 33 (hole)      270 -1.21455e+03     -1.72e-06
polygon 34 (hole)       19 -4.39650e+00     -6.23e-09
polygon 35 (hole)       35 -1.38385e+02     -1.96e-07
polygon 36 (hole)       23 -1.99656e+01     -2.83e-08
polygon 37              40  1.38607e+04      1.96e-05
polygon 38 (hole)       41 -6.00381e+03     -8.50e-06
polygon 39 (hole)        7 -1.40546e-01     -1.99e-10
polygon 40 (hole)       11 -8.36705e+01     -1.18e-07
polygon 41 (hole)        3 -2.33435e-03     -3.31e-12
polygon 42              45  2.51218e+03      3.56e-06
polygon 43             139  3.22293e+03      4.56e-06
polygon 44             148  3.10395e+03      4.40e-06
polygon 45 (hole)        4 -1.72650e-04     -2.44e-13
polygon 46              75  1.73526e+04      2.46e-05
polygon 47              83  5.28920e+03      7.49e-06
polygon 48             106  3.04104e+03      4.31e-06
polygon 49             264  1.50631e+06      2.13e-03
polygon 50              71  5.63061e+03      7.97e-06
polygon 51              10  1.99717e+02      2.83e-07
polygon 52 (hole)        3 -1.37223e-02     -1.94e-11
polygon 53             487  2.06117e+06      2.92e-03
polygon 54              65  8.42861e+04      1.19e-04
polygon 55              47  3.82087e+04      5.41e-05
polygon 56              22  6.74651e+03      9.55e-06
enclosing rectangle: [2667.54, 55941.94] x [21448.47, 50256.33] units
                     (53270 x 28810 units)
Window area = 706156000 square units
Fraction of frame area: 0.46

Plot the before and after to compare the outputs Starting Locations

par(mfrow=c(1,2))
par(mar = c(3,3,2,1))
plot(origin_ppp)
plot(originSG_ppp)

Ending Locations

par(mfrow=c(1,2))
par(mar = c(3,3,2,1))
plot(dest_ppp)
plot(destSG_ppp)

Exploratory Data Analysis

From the raster images, it is difficult to determine which parts of Singapore had higher demand for pick-up and drop-off.

Additional Data Consolidation

Add a count column for starting and ending locations in mpsz3414

# form common identifier by passing SUBZONE_N and PLN_AREA_N over
joined_origin <- st_join(origin_sf, mpsz3414)
joined_dest <- st_join(dest_sf, mpsz3414)


# create count number by subzone
origin_counts <- joined_origin %>%
  group_by(SUBZONE_N) %>%
  summarise(total_origin = n())

dest_counts <- joined_dest %>%
  group_by(SUBZONE_N) %>%
  summarise(total_dest = n())


# insert the count columns into mpsz3414
mpsz3414 <- st_join(mpsz3414, origin_counts, by = "SUBZONE_N")

mpsz3414 <- st_join(mpsz3414, dest_counts, by = "SUBZONE_N")

# remove the duplicated subzone columns
mpsz3414 <- mpsz3414[, !names(mpsz3414) %in% c("SUBZONE_N.x", "SUBZONE_N.y")]

First let’s check for any null values.

any(is.na(mpsz3414$total_origin))
[1] TRUE
any(is.na(mpsz3414$total_dest))
[1] TRUE

Fill NA with zero using coalesce() from dplyr.

mpsz3414 <- mpsz3414 %>%
  mutate(total_origin = coalesce(total_origin, 0),
         prop_origin = total_origin / sum(total_origin))

mpsz3414 <- mpsz3414 %>%
  mutate(total_dest = coalesce(total_dest, 0),
         prop_dest = total_dest / sum(total_dest))
any(is.na(mpsz3414$total_origin))
[1] FALSE
any(is.na(mpsz3414$total_dest))
[1] FALSE
class(mpsz3414)
[1] "sf"         "data.frame"
  • Now there are no null values.

Since I have added critical data in mpsz3414, it is easier to plot the choropleth from the it. However, mpsz3414 is missing the geometry attribute. So let’s insert it from mpsz_sf.

# reorder columns for readability
mpsz3414 <- mpsz3414[c("SUBZONE_N", "PLN_AREA_N", "Name", "total_origin", "prop_origin", "total_dest", "prop_dest", "geometry")]

Choropleth Maps

Now, let’s have a general overview of the distribution of pick-up and drop-off points.

Starting Locations

tmap_options(check.and.fix = TRUE)

origin_choropleth <-
tm_shape(mpsz3414) +
  tm_fill("prop_origin",
          n=10,
          title="Proportion",
          style="equal",
          palette="YlOrRd") +
  tm_borders(lwd=0.2,
             alpha=1) +
  tm_layout(main.title = "Distribution of Pick-up Points",
            legend.outside=FALSE,
            main.title.size=1)

Ending Locations

dest_choropleth <-
tm_shape(mpsz3414) +
  tm_fill("prop_dest",
          n=10,
          title="Proportion",
          style="equal",
          palette="YlOrRd") +
  tm_borders(lwd=0.2,
             alpha=1) +
  tm_layout(main.title = "Distribution of Drop-off Points",
            legend.outside=FALSE,
            main.title.size=1) 

Comparing Starting and Ending Locations

tmap_mode("plot")
tmap_arrange(origin_choropleth, dest_choropleth, nrow=1)

  • At a quick glance, we can observe that there are the most pick-up and drop-off locations at the Changi Airport subzone. The quick inference is that there are many travellers commuting to and from the airport via Grab ride-hailing services.

Locations with top proportions

Sort prop_origin in descending order using desc().

origin_top <- mpsz3414 %>%
  arrange(desc(prop_origin)) %>%
  slice(1:10)

origin_top <- origin_top[, c("SUBZONE_N", "PLN_AREA_N", "total_origin", "prop_origin")]

origin_top
Simple feature collection with 10 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 11411.94 ymin: 31874.91 xmax: 50271.73 ymax: 47996.47
Projected CRS: SVY21 / Singapore TM
         SUBZONE_N  PLN_AREA_N total_origin prop_origin
1   CHANGI AIRPORT      CHANGI         1456  0.05200000
2            XILIN    TAMPINES          660  0.02357143
3    TAMPINES EAST    TAMPINES          600  0.02142857
4   WOODLANDS EAST   WOODLANDS          474  0.01692857
5            SIMEI    TAMPINES          418  0.01492857
6           YUNNAN JURONG WEST          403  0.01439286
7   WOODLANDS WEST   WOODLANDS          367  0.01310714
8        WOODGROVE   WOODLANDS          362  0.01292857
9         ALJUNIED     GEYLANG          305  0.01089286
10 WOODLANDS SOUTH   WOODLANDS          305  0.01089286
                         geometry
1  MULTIPOLYGON (((45818.88 41...
2  MULTIPOLYGON (((44832.31 33...
3  MULTIPOLYGON (((42196.76 38...
4  MULTIPOLYGON (((24786.75 46...
5  MULTIPOLYGON (((42267.54 36...
6  MULTIPOLYGON (((12670.56 35...
7  MULTIPOLYGON (((21900.33 45...
8  MULTIPOLYGON (((22694.69 46...
9  MULTIPOLYGON (((34449.13 33...
10 MULTIPOLYGON (((24091.47 46...

Sort prop_dest in descending order using desc().

dest_top <- mpsz3414 %>%
  arrange(desc(prop_dest)) %>%
  slice(1:10) 

dest_top <- dest_top[, c("SUBZONE_N", "PLN_AREA_N", "total_dest", "prop_dest")]

dest_top
Simple feature collection with 10 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 20830.46 ymin: 31874.91 xmax: 50271.73 ymax: 46337.37
Projected CRS: SVY21 / Singapore TM
                 SUBZONE_N              PLN_AREA_N total_dest  prop_dest
1           CHANGI AIRPORT                  CHANGI       1633 0.05832768
2            TAMPINES EAST                TAMPINES        496 0.01771618
3                    XILIN                TAMPINES        409 0.01460871
4  CENTRAL WATER CATCHMENT CENTRAL WATER CATCHMENT        406 0.01450155
5                WOODGROVE               WOODLANDS        350 0.01250134
6             KAMPONG JAVA                 KALLANG        340 0.01214416
7               SWISS CLUB             BUKIT TIMAH        330 0.01178698
8                 ALJUNIED                 GEYLANG        326 0.01164410
9                BENDEMEER                 KALLANG        296 0.01057256
10               HILLCREST             BUKIT TIMAH        296 0.01057256
                         geometry
1  MULTIPOLYGON (((45818.88 41...
2  MULTIPOLYGON (((42196.76 38...
3  MULTIPOLYGON (((44832.31 33...
4  MULTIPOLYGON (((25073.29 43...
5  MULTIPOLYGON (((22694.69 46...
6  MULTIPOLYGON (((30089.09 32...
7  MULTIPOLYGON (((24031.39 36...
8  MULTIPOLYGON (((34449.13 33...
9  MULTIPOLYGON (((31277.37 34...
10 MULTIPOLYGON (((25392.54 35...
  • After Changi, planning areas like Tampines- and Woodlands-related subzones are in the top 10 for the proportions of pick-up and drop-off locations.

  • Interestingly, Central Water Catchment planning area is one of the top few drop-off locations.

Tampines

The Tampines planning area has piqued my interest. Let’s have a closer look.

tampines <- mpsz3414[mpsz3414$PLN_AREA_N == 'TAMPINES', ]

roads_tampines <- sg_roads[st_intersection(sg_roads, tampines), ]

origin_tampines <- joined_origin[st_intersection(joined_origin, tampines), ]


tmap_mode("view")

tm_basemap("OpenStreetMap")+
  tm_shape(tampines) +
    tm_borders(lwd=0.5) +
  tm_shape(roads_tampines) +
    tm_lines() +
  tm_shape(origin_tampines) +
    tm_dots()
tmap_mode("plot")

Woodlands

woodlands <- mpsz3414[mpsz3414$PLN_AREA_N == 'WOODLANDS', ]

roads_woodlands <- sg_roads[st_intersection(sg_roads, woodlands), ]

origin_woodlands <- joined_origin[st_intersection(joined_origin, woodlands), ]


tmap_mode("view")

tm_basemap("OpenStreetMap")+
  tm_shape(woodlands) +
    tm_borders(lwd=0.5) +
  tm_shape(roads_woodlands) +
    tm_lines() +
  tm_shape(origin_woodlands) +
    tm_dots()
tmap_mode("plot")

Central Water Catchment (CWC)

Since CWC area popped up as one of the top ten drop off locations, we should specify joined dest.

cwc <- mpsz3414[mpsz3414$PLN_AREA_N == 'CENTRAL WATER CATCHMENT', ]

roads_cwc <- sg_roads[st_intersection(sg_roads, cwc), ]

dest_cwc <- joined_dest[st_intersection(joined_dest, cwc), ]


tmap_mode("view")


tm_basemap("OpenStreetMap")+
  tm_shape(cwc) +
    tm_borders(lwd=0.5) +
  tm_shape(roads_cwc) +
    tm_lines() +
  tm_shape(dest_cwc) +
    tm_dots()
Interesting

Many of the drop-off points are along the parameters of the CWC planning area!

tmap_mode("plot")

Downtown Core

Let’s explore other areas like CBD as my assumption is that office workers might use Grab to travel to their workplaces. From prior knowledge, CBD is within the planning area of Downtown Core.

Starting Locations

downtown_core <- mpsz3414[mpsz3414$PLN_AREA_N == 'DOWNTOWN CORE', ]

roads_downtown_core <- sg_roads[st_intersection(sg_roads, downtown_core), ]

origin_downtown_core <- joined_origin[st_intersection(joined_origin, downtown_core), ]


tmap_mode("view")


tm_basemap("OpenStreetMap")+
  tm_shape(downtown_core) +
    tm_borders(lwd=0.5) +
  tm_shape(roads_downtown_core) +
    tm_lines() +
  tm_shape(origin_downtown_core) +
    tm_dots()
Interesting

Compared to the sparse distribution of points previously for CWC, the pick-up points seem to be distributed all over the roads in CBD.

tmap_mode("plot")

Kernel Density Estimation

Comparing Spatial Point Patterns using KDE

Here, we shall compare the kernel density estimation of pick-up and drop-off points at the pre-selected planning areas.

Create owin object

Convert the SpatialPolygons objects into owin objects as required by spatstat.

tm_owin <- as.owin(tampines)
wd_owin <- as.owin(woodlands)
dt_owin <- as.owin(downtown_core)
cwc_owin <- as.owin(cwc)

Combining points and the interested areas

Extract the pick-up and drop-off points within each of the study planning areas.

Interested Starting Locations

origin_tm_ppp = origin_ppp[tm_owin]
origin_wd_ppp = origin_ppp[wd_owin]
origin_dt_ppp = origin_ppp[dt_owin]
origin
standardGeneric for "origin" defined from package "terra"

function (x, ...) 
standardGeneric("origin")
<bytecode: 0x0000029bda940a48>
<environment: 0x0000029bda949cc8>
Methods may be defined for arguments: x
Use  showMethods(origin)  for currently available ones.
Note

Using origin_ppp or originSG_ppp does not matter as the restriction to the various planning area object windows will extract the necessary points.

Let’s use rescale() to transform the unit of measurement from metres to kilometres.

origin_tm_ppp.km = rescale(origin_tm_ppp, 1000, "km")
origin_wd_ppp.km = rescale(origin_wd_ppp, 1000, "km")
origin_dt_ppp.km = rescale(origin_dt_ppp, 1000, "km")

Then look at the output.

par(mfrow=c(2,2))
par(mar = c(3,3,2,1))
plot(origin_tm_ppp.km, main= "Origin Tampines")
plot(origin_wd_ppp.km, main= "Origin Woodlands")
plot(origin_dt_ppp.km, main= "Origin Downtown Core")

Repeat the same steps for the interested ending locations!

Interested Ending Locations

dest_tm_ppp = dest_ppp[tm_owin]
dest_wd_ppp = dest_ppp[wd_owin]
dest_dt_ppp = dest_ppp[dt_owin]
dest_cwc_ppp = dest_ppp[cwc_owin]

Let’s use rescale() to transform the unit of measurement from metres to kilometres.

dest_tm_ppp.km = rescale(dest_tm_ppp, 1000, "km")
dest_wd_ppp.km = rescale(dest_wd_ppp, 1000, "km")
dest_dt_ppp.km = rescale(dest_dt_ppp, 1000, "km")
dest_cwc_ppp.km = rescale(dest_cwc_ppp, 1000, "km")
par(mfrow=c(2,2))
par(mar = c(3,3,2,1))
plot(dest_tm_ppp, main= "Dest Tampines")
plot(dest_wd_ppp, main= "Dest Woodlands")
plot(dest_dt_ppp, main= "Dest Downtown Core")
plot(dest_cwc_ppp, main="Dest CWC")

Bandwidth Methods

There are multiple methods in determining the bandwidth type to use for kernel density. Here, let’s experiment with the different bandwidth methods before proceeding further.

Automatic Bandwidth Selection

The purpose for this project is to visualise where the pick-up and drop-off points are clustered. Let’s observe the cluster visualisations produced by the various bandwidth functions.

Interested Starting Locations

Tampines

bw.CvL
plot(density(origin_tm_ppp.km,
             sigma=bw.CvL,
             edge=TRUE,
             kernel="gaussian"),
             main="Tamp Origin CvL")

bw.scott
plot(density(origin_tm_ppp.km,
             sigma=bw.scott,
             edge=TRUE,
             kernel="gaussian"),
             main="Tamp Origin scott")

bw.diggle
plot(density(origin_tm_ppp.km,
             sigma=bw.diggle,
             edge=TRUE,
             kernel="gaussian"),
             main="Tamp Origin diggle")

bw.ppl
plot(density(origin_tm_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="gaussian"),
             main="Tamp Origin ppl")

  • From the visualisations of Tampines, let’s go ahead with bw.diggle for the Tampines starting locations.

Woodlands

plot(density(origin_wd_ppp.km,
             sigma=bw.CvL,
             edge=TRUE,
             kernel="gaussian"),
             main="Woodlands Origin CvL")

plot(density(origin_wd_ppp.km,
             sigma=bw.scott,
             edge=TRUE,
             kernel="gaussian"),
             main="Woodlands Origin scott")

plot(density(origin_wd_ppp.km,
             sigma=bw.diggle,
             edge=TRUE,
             kernel="gaussian"),
             main="Woodlands Origin diggle")

plot(density(origin_wd_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="gaussian"),
             main="Woodlands Origin ppl")

  • The bw.ppl method can be used as it shows better visualisation of predominant tight clusters in Woodlands.

Downtown Core

plot(density(origin_dt_ppp.km,
        sigma=bw.CvL,
        edge=TRUE,
        kernel="gaussian"),
        main="Downtown Origin CvL")

plot(density(origin_dt_ppp.km,
        sigma=bw.scott,
        edge=TRUE,
        kernel="gaussian"),
        main="Downtown Origin scott")

plot(density(origin_dt_ppp.km,
        sigma=bw.diggle,
        edge=TRUE,
        kernel="gaussian"),
        main="Downtown Origin")

plot(density(origin_dt_ppp.km,
        sigma=bw.ppl,
        edge=TRUE,
        kernel="gaussian"),
        main="Downtown Origin ppl")

  • The bw.ppl method also seems to produce the best visualisation for CBD Downtown Core.

Interested Ending Locations

Tampines

plot(density(dest_tm_ppp.km,
             sigma=bw.CvL,
             edge=TRUE,
             kernel="gaussian"),
             main="Tampines Dest CvL")

plot(density(dest_tm_ppp.km,
             sigma=bw.scott,
             edge=TRUE,
             kernel="gaussian"),
             main="Tampines Dest scott")

plot(density(dest_tm_ppp.km,
             sigma=bw.diggle,
             edge=TRUE,
             kernel="gaussian"),
             main="Tampines Dest diggle")

plot(density(dest_tm_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="gaussian"),
             main="Tampines Dest ppl")

  • In my opinion, the bw.ppl method showcases tighter clusters better than bw.diggle. Hence the bw.diggle method will be used for the Tampines drop-off locations.

Woodlands

plot(density(dest_wd_ppp.km,
             sigma=bw.CvL,
             edge=TRUE,
             kernel="gaussian"),
             main="Woodlands Dest CvL")

plot(density(dest_wd_ppp.km,
             sigma=bw.scott,
             edge=TRUE,
             kernel="gaussian"),
             main="Woodlands Dest scott")

plot(density(dest_wd_ppp.km,
             sigma=bw.diggle,
             edge=TRUE,
             kernel="gaussian"),
             main="Woodlands Dest diggle")

plot(density(dest_wd_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="gaussian"),
             main="Woodlands Dest ppl")

  • The bw.ppl method seems to work best for the Woodlands drop-off locations.

Central Water Catchment

plot(density(dest_cwc_ppp.km,
             sigma=bw.CvL,
             edge=TRUE,
             kernel="gaussian"),
             main="CWC Dest CvL")

plot(density(dest_cwc_ppp.km,
             sigma=bw.scott,
             edge=TRUE,
             kernel="gaussian"),
             main="CWC Dest scott")

plot(density(dest_cwc_ppp.km,
             sigma=bw.diggle,
             edge=TRUE,
             kernel="gaussian"),
             main="CWC Dest diggle")

plot(density(dest_cwc_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="gaussian"),
             main="CWC Dest ppl")

  • Again, the bw.ppl method seems to work best for the Central Water Catchment drop-off points.

To summarise the automatic bandwidth selection process, the bw.ppl method is chosen for all areas of interest except for Tampines Pick-ups.

Baddeley et al. (2016) suggested the use of bw.ppl() algorithm because in their experience, the algorithm tends to produce the more appropriate values when the pattern consists predominantly tight clusters.

However, they also insist that if the purpose is to detect a single tight cluster in the midst of random noise then bw.diggle() is the best. Thus we can test for this using the Tampines Pick-ups.

Kernel Method

The different kernel methods are demonstrated below.

Interested Starting Locations

Tampines

plot(density(origin_tm_ppp.km,
             sigma=bw.diggle,
             edge=TRUE,
             kernel="gaussian"),
     main="Gaussian")

plot(density(origin_tm_ppp.km,
             sigma=bw.diggle,
             edge=TRUE,
             kernel="epanechnikov"),
     main="Quadratic")

plot(density(origin_tm_ppp.km,
             sigma=bw.diggle,
             edge=TRUE,
             kernel="quartic"),
     main="Quartic")

plot(density(origin_tm_ppp.km,
             sigma=bw.diggle,
             edge=TRUE,
             kernel="disc"),
     main="Disc")

  • There are no significant differences observed across the Tampines Origin kernels, so we will use the default which is Gaussian.

Woodlands

plot(density(origin_wd_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="gaussian"),
     main="Gaussian")

plot(density(origin_wd_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="epanechnikov"),
     main="Quadratic")

plot(density(origin_wd_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="quartic"),
     main="Quartic")

plot(density(origin_wd_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="disc"),
     main="Disc")

  • For Woodlands pick-ups, the disc kernel method seems best for visualisation.

Downtown Core (CBD)

:::{.panel-tabset}

Gaussian
plot(density(origin_dt_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="gaussian"),
     main="Gaussian")

Quadratic (Epanechnikov)
plot(density(origin_dt_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="epanechnikov"),
     main="Quadratic")

Quartic
plot(density(origin_dt_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="quartic"),
     main="Quartic")

Disc
plot(density(origin_dt_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="disc"),
     main="Disc")

  • The visualisation outputs across the kernel methods are similar, hence we will adopt the default Gaussian kernel method for CBD pick-ups.

Interested Ending Locations

Tampines

plot(density(dest_tm_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="gaussian"),
     main="Gaussian")

plot(density(dest_tm_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="epanechnikov"),
     main="Epanechnikov")

plot(density(dest_tm_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="quartic"),
     main="Quartic")

plot(density(dest_tm_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="disc"),
     main="Disc")

  • The visualisation outputs across the kernel methods are similar, hence we will adopt the default Gaussian kernel method for Tampines drop-offs.

Woodlands

plot(density(dest_wd_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="gaussian"),
     main="Gaussian")

plot(density(dest_wd_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="epanechnikov"),
     main="Epanechnikov")

plot(density(dest_wd_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="quartic"),
     main="Quartic")

plot(density(dest_wd_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="disc"),
     main="Disc")

  • The visualisation outputs across the kernel methods are similar, hence we will adopt the default Gaussian kernel method for Woodlands drop-offs.

CWC

plot(density(dest_cwc_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="gaussian"),
     main="Gaussian")

plot(density(dest_cwc_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="epanechnikov"),
     main="Epanechnikov")

plot(density(dest_cwc_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="quartic"),
     main="Quartic")

plot(density(dest_cwc_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="disc"),
     main="Disc")

  • The visualisation outputs across the kernel methods are similar, hence we will adopt the default Gaussian kernel method for Woodlands drop-offs.

Summary of Kernel Method Selections

A grand summary of what I have done so far:

Part Planning Area Automatic Bandwidth Kernel Method
Origin Tampines diggle Gaussian
Origin Woodlands ppl Disc
Origin Downtown Core ppl Gaussian
Destination Tampines ppl Gaussian
Destination Woodlands ppl Gaussian
Destination Central Water Catchment ppl Gaussian

Final Kernel Density Estimation Maps

kde_origin_tm.bw <-
density(origin_tm_ppp.km,
             sigma=bw.diggle,
             edge=TRUE,
             kernel="gaussian")
plot(kde_origin_tm.bw)

kde_origin_wd.bw <- density(origin_wd_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="disc")
plot(kde_origin_wd.bw)

kde_origin_dt.bw <-
density(origin_dt_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="gaussian")
plot(kde_origin_dt.bw)

kde_dest_tm.bw <- density(dest_tm_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="gaussian")
plot(kde_dest_tm.bw)

kde_dest_wd.bw <- density(dest_wd_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="gaussian")
plot(kde_dest_wd.bw)

kde_dest_cwc.bw <- density(dest_cwc_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="gaussian")
plot(kde_dest_cwc.bw)

Converting Gridded Output into Raster

To display KDE maps on tmap, the maps neeed to be converted to rasters.

Converting to Spatial Grid Data Frame

First set the KDE maps into Spatial Grids.

gridded_kde_origin_tm_bw <- as(kde_origin_tm.bw, "SpatialGridDataFrame")

gridded_kde_origin_wd_bw <- as(kde_origin_wd.bw, "SpatialGridDataFrame")

gridded_kde_origin_dt_bw <- as(kde_origin_dt.bw, "SpatialGridDataFrame")

gridded_kde_dest_tm_bw <- as(kde_dest_tm.bw, "SpatialGridDataFrame")

gridded_kde_dest_wd_bw <- as(kde_dest_wd.bw, "SpatialGridDataFrame")

gridded_kde_dest_cwc_bw <- as(kde_dest_cwc.bw, "SpatialGridDataFrame")

View the gridded outputs.

spplot(gridded_kde_origin_tm_bw)

spplot(gridded_kde_origin_wd_bw)

spplot(gridded_kde_origin_dt_bw)

spplot(gridded_kde_dest_tm_bw)

spplot(gridded_kde_dest_wd_bw)

spplot(gridded_kde_dest_cwc_bw)

Rasterisation of Grid Outputs

Convert the gridded outputs into rasters.

kde_origin_tm_bw_raster <- raster(gridded_kde_origin_tm_bw)

kde_origin_wd_bw_raster <- raster(gridded_kde_origin_wd_bw)

kde_origin_dt_bw_raster <- raster(gridded_kde_origin_dt_bw)

kde_dest_tm_bw_raster <- raster(gridded_kde_dest_tm_bw)

kde_dest_wd_bw_raster <- raster(gridded_kde_dest_wd_bw)

kde_dest_cwc_bw_raster <- raster(gridded_kde_dest_cwc_bw)

Assigning Projection Systems

Now include the CRS information.

projection(kde_origin_tm_bw_raster) <- CRS("+init=EPSG:3414")

projection(kde_origin_wd_bw_raster) <- CRS("+init=EPSG:3414")

projection(kde_origin_dt_bw_raster) <- CRS("+init=EPSG:3414")

projection(kde_dest_tm_bw_raster) <- CRS("+init=EPSG:3414")

projection(kde_dest_wd_bw_raster) <- CRS("+init=EPSG:3414")

projection(kde_dest_cwc_bw_raster) <- CRS("+init=EPSG:3414")

Visualise tmap Outputs

tmap_mode("plot")
tm_shape(kde_origin_tm_bw_raster) +
  tm_raster("v") +
  tm_layout(legend.position = c("left","bottom"),frame=FALSE)

tmap_mode("plot")
tm_shape(kde_origin_wd_bw_raster) +
  tm_raster("v") +
  tm_layout(legend.position = c("right","top"),frame=FALSE)

tmap_mode("plot")
tm_shape(kde_origin_dt_bw_raster) +
  tm_raster("v") +
  tm_layout(legend.position = c("right","bottom"),frame=FALSE)

tmap_mode("plot")
tm_shape(kde_dest_tm_bw_raster) +
  tm_raster("v") +
  tm_layout(legend.position = c("left","bottom"),frame=FALSE)

tmap_mode("plot")
tm_shape(kde_dest_wd_bw_raster) +
  tm_raster("v") +
  tm_layout(legend.position = c("left","top"),frame=FALSE)

tmap_mode("plot")
tm_shape(kde_dest_cwc_bw_raster) +
  tm_raster("v") +
  tm_layout(legend.position = c("left","bottom"),frame=FALSE)

Network-Constrained Kernel Density Estimation (NKDE) & Analysis

Previously, I have already extracted out the roads for Tampines (roads_tampines), Woodlands (roads_woodlands), Downtown Core (roads_downtown_core) and Central Water Catchment (roads_cwc).

Segmentation & Snapping

To perform NKDE, events like pick-up and drop-off points need to be snapped onto the road network. This will enable the lines to gain the necessary spatial weights based on point events.

Snapped events are shown in green (Credit: Jeremy Gelb, 2024)

Create the line segments for all the road networks of our areas of interest.

Determining the maximum distance

Locals often walk up to 80 metres to hop onto their Grab vehicle.

Plot out the network densities

Tampines

Origin KDE

tmap_mode("plot")
tm_shape(kde_origin_tm_bw_raster) +
  tm_raster("v") +
  tm_layout(legend.position = c("left","bottom"),frame=FALSE)

Origin NKDE

unique_types <- unique(st_geometry_type(roads_tampines))

# debugging: filter roads_tampines to keep only linestrings
if ("LINESTRING" %in% unique_types) {
  roads_tampines <- st_cast(roads_tampines, "LINESTRING")
} else {
  # handle the case when no linestrings are found
  stop("No linestrings found in roads_tampines.")
}
lixels_tm <- lixelize_lines(roads_tampines,
                         80,
                         mindist=40)

samples_tm <- lines_center(lixels_tm)

densities_tm_origin <- nkde(roads_tampines, 
                  events = origin_tampines,
                  w = rep(1,nrow(origin_tampines)),
                  samples = samples_tm,
                  kernel_name = "quartic",
                  bw = 300, 
                  div= "bw", 
                  method = "simple", 
                  digits = 1, 
                  tol = 1,
                  grid_shape = c(1,1), 
                  max_depth = 8,
                  agg = 5, #we aggregate events within a 5m radius (faster calculation)
                  sparse = TRUE,
                  verbose = FALSE)

samples_tm$density_origin <- densities_tm_origin
lixels_tm$density_origin <- densities_tm_origin

samples_tm$density_origin <- samples_tm$density_origin*1000
lixels_tm$density_origin <- lixels_tm$density_origin*1000

tmap_mode("view")
tm_shape(lixels_tm)+
  tm_lines(col="density_origin")+
tm_shape(origin_tampines)+
  tm_dots(alpha=0.1) +
tm_basemap("OpenStreetMap")

Destination NKDE

dest_tampines <- joined_dest[st_intersection(joined_dest, tampines), ]

densities_tm_dest <- nkde(roads_tampines, 
                  events = dest_tampines,
                  w = rep(1,nrow(dest_tampines)),
                  samples = samples_tm,
                  kernel_name = "quartic",
                  bw = 300, 
                  div= "bw", 
                  method = "simple", 
                  digits = 1, 
                  tol = 1,
                  grid_shape = c(1,1), 
                  max_depth = 8,
                  agg = 5, #we aggregate events within a 5m radius (faster calculation)
                  sparse = TRUE,
                  verbose = FALSE)

samples_tm$density_dest <- densities_tm_dest
lixels_tm$density_dest <- densities_tm_dest

samples_tm$density_dest <- samples_tm$density_dest*1000
lixels_tm$density_dest <- lixels_tm$density_dest*1000

tmap_mode("view")
tm_shape(lixels_tm)+
  tm_lines(col="density_dest")+
tm_shape(origin_tampines)+
  tm_dots(alpha=0.1) +
tm_basemap("OpenStreetMap")
tmap_mode("plot")

Analysis of Tampines

Origin: KDE vs NKDE

Based on the traditional KDE results, we can generally comment that the densest cluster of pick-up points amounting up to 12,000 is concentrated at the north-eastern ridge of Tampines, which in particular lies in Xilin Subzone (Simei). However, this most significant finding from the KDE result overlooks at the insights from the NKDE results featuring the varying intensities of road utility in Tampines.

For the Tampines pick-ups, it seems that there are a few more intense road networks. First, there is an intensity value of up to 0.8 at the cross junction of Xilin Avenue and Upper Changi Road East (i.e. southwestern ridge of ITE College East), and at the T-junction intersecting Simei St 3 and Simei Avenue providing values (i.e. Southeast of Changi General Hospital) of up to 0.8. Furthermore, there is an intensity value of up to 0.6 at the cross-junction of Sompah Road and Upper Changi Road East that is adjacent to the southwestern corner of SUTD.

Idea

These highlighted areas can suggest that people leaving work or appointments utilise Grab services. We can try to verify this with Temporal Network Kernel Density Estimation later!

The other prominent location is Tampines Ave 2, along the same side as Blk 302, as it has the highest NKDE value of up to 1.0. Given the high concentration of activity along this bus stop road, it may be advisable for Grab users to choose a different pick-up location since this road seems congested with just Grab data alone. The actual NKDE of this road stretch may be significantly higher if we considered the buses.

Idea

An interesting fact to note is that this road is always congested during peak morning hours. I experience this first-hand on a regular basis while waiting for a long time for my bus at Block 302! I hypothesise that there are more Grab pick-ups in the morning than at night, so we may explore any temporal differences later!

NKDE: Origin vs Destination

A similarity across the pick-up and drop-offs is that the road along Blk 302 remains as one of the most intense and thus many Grab rides either start or end along this stretch. However, the intensity of drop-offs here is lower at 0.6, which shows that there is a comparatively lower drop-off occurrences here than pick-ups.

Nevertheless, the greatest NKDE value for drop-offs in Tampines is situated at the cross junction of Tampines Ave 2 and Simei Avenue, where neighbourhood hub spots like SAFRA Tampines and Tampines Round Market are at. When Grab users are going to Tampines, they could be intending to use recreational facilities or to eat at economical hawker food.

(TBC possibly with NKDE analysis of other locations and TKDE.)